Bullshit Benchmark - Snapshot

At a Glance

Leaderboard

# Model Org Reasoning Avg Green Red 2/1/0 Rows Errors

All Rows

Sample Model QID Technique Score Band Status

Selected Row Detail

Question

Nonsensical Element

Model Response

Judges

Raw Row JSON